An Empirical Study of Differences between Conversion Schemes and Annotation Guidelines
Abstract
We establish quantitative methods for comparing and estimating the quality of dependency annotations or conversion schemes. We use generalized tree-edit distance to measure divergence between annotations and propose theoretical learnability, derivational perplexity and downstream performance for evaluation. We present systematic experiments with tree-to-dependency conversions of the Penn-III treebank, as well as observations from experiments using treebanks from multiple languages. Our most important observations are: (a) parser bias makes most parsers insensitive to non-local differences between annotations, but (b) choice of annotation nevertheless has significant impact on most downstream applications, and (c) while learnability does not correlate with downstream performance, learnable annotations will lead to more robust performance across domains.
Similar resources
An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
Towards Feasible Guidelines for the Annotation of Argument Schemes
The annotation of argument schemes represents an important step for argumentation mining. General guidelines for the annotation of argument schemes, applicable to any topic, are still missing due to the lack of a suitable taxonomy in Argumentation Theory and the need for highly trained expert annotators. We present a set of guidelines for the annotation of argument schemes, taking as a framewor...
Identifying Argumentation Schemes in Genetics Research Articles
This paper presents preliminary work on identification of argumentation schemes, i.e., identifying premises, conclusion and name of argumentation scheme, in arguments for scientific claims in genetics research articles. The goal is to develop annotation guidelines for creating corpora for argumentation mining research. This paper gives the specification of ten semantically distinct argumentatio...
Gender and Crime: An Empirical Test of General Strain Theory among Youth in Babol (A City in Northern Part of Iran)
This paper uses Agnew’s General Strain Theory (GST) (1992) to explain differences in criminal behavior between young males and females in Babol, a city in northern Iran. GST is essentially a set of ideas formulated to explain the occurrence of crime as a result of strain in social life. This study explores th...
Challenges in Converting between Treebanks: a Case Study from the HUTB
An important question for treebank development is whether high-quality conversion from one representation (e.g., dependency structure) to another representation (e.g., phrase structure) is possible, assuming that annotation guidelines exist for both representations. In this study, we demonstrate that the conversion is possible only under certain conditions, and even when the conditions are met,...